Origin of Correlations between Local Conformational States of Consecutive Amino Acid Residues and Their Role in Shaping Protein Structures and in Allostery

By analyzing the Kubo-cluster-cumulant expansion of the potential of mean force of polypeptide chains corresponding to backbone-local interactions averaged over the rotation of the peptide groups about the Cα···Cα virtual bonds, we identified two important kinds of “along-chain” correlations that pertain to extended chain segments bordered by turns (usually the β-strands) and to the folded spring-like segments (usually α-helices), respectively, and are expressed as multitorsional potentials. These terms affect the positioning of structural elements with respect to each other and, consequently, contribute to determining their packing. Additionally, for extended chain segments, the correlation terms contribute to propagating the conformational change at one end to the other end, which is characteristic of allosteric interactions. We confirmed both findings by statistical analysis of the virtual-bond geometry of 77 950 proteins. Augmenting coarse-grained and, possibly, all-atom force fields with these correlation terms could improve their capacity to model protein structure and dynamics.


Derivation of the lowest-order term in multitorsional potentials
Following the methodology presented in ref 1, the lowest-order Kubo cluster cumulant (ref 2) in the expansion of the generic $(m-2)$nd-order contribution (i.e., the one encompassing the whole chain segment under consideration) to the potential of mean force of the local interactions of an $m$-residue segment of a polypeptide chain, from $C^{\alpha}_{k}$ to $C^{\alpha}_{k+m-1}$, is expressed by eq S1.
$$
f_{k,m_k;\,k+1,m_{k+1}}\; g_{k+1,m_{k+1};\,k+2,m_{k+2}} \cdots g_{k+m-4,m_{k+m-4};\,k+m-3,m_{k+m-3}}\; f_{k+m-3,m_{k+m-3};\,k+m-2,m_{k+m-2}} \tag{S1}
$$

where $f_{i\alpha;j\beta}(\lambda_i)$ and $g_{i\alpha;j\beta}(\lambda_i,\lambda_j)$ are the terms of the expression for the distance between the atom with index $\alpha$ of unit $i$ and the atom with index $\beta$ of unit $j$ that depend on the angle $\lambda_i$, or on $\lambda_i$ and $\lambda_j$, for the rotation of those atoms about the respective virtual-bond axes, as expressed by eqs 32–42 and illustrated in Figure 2 of ref 1, and $\langle\cdots\rangle$ denotes the average over all $\lambda$s. The coefficients $c_{i,m_i}$, $i = 1, 2, \ldots, k+m-2$, depend on the derivatives of the energy of interaction between the respective atoms with respect to distance (cf. eq 43 of ref 1).
Because the $f$s and the $g$s correspond to adjacent units, they can be expressed by eqs S2–S4, respectively (see also eqs 77–79 of ref 1).
where the coefficients $a_{i\alpha;i+1,\beta}$ and $b_{i\alpha;i+1,\beta}$ depend on the location of the atoms with local indices $\alpha$ and $\beta$ with respect to the origin and the virtual-bond axis of the unit, $\phi_{i+1,\alpha}$ is the angle of rotation of the atom with local index $\alpha$ of unit $i$ about the virtual-bond axis in the reference configuration (for planar units, $\phi$ is 0° or 180°), and $\Psi_{i+1,i}$ is the angle of rotation of the virtual-bond axis of unit $i+1$ about the virtual-bond axis of unit $i$ (see …).

It should be noted that, apart from the expressions of eq S1, expressions corresponding to the $m$-bead segments of polypeptide chains that contain $f$-terms inside the sequence also occur. However, because each consecutive $f$-term inside the sequence depends on a single $\lambda$ only, these expressions are products of the components of $U_m$ for shorter chain segments, as shown by eq S5 for an example. Therefore, they are subtracted in the expression for the generic $(m-2)$nd-order cumulant (see eq 15 in ref 1). On the other hand, products of the multitorsional terms corresponding to consecutive chain segments could appear in the force-field expressions for multitorsional potentials (see, for example, eq 90 in ref 1, which expresses double-torsional potentials), because they correspond to higher-order terms in the cumulant expansion for shorter chain segments.

For the sake of clarity, the atom indices $m_i$ are dropped in eq S5.
The components of $U_m$ are expressed by eq S6 (again, the atom indices $m_i$ are dropped for clarity).
By using the Euler formula to express the cosines in terms of imaginary exponentials, which reduces the integration to finding all exponential terms with zero coefficients at the $\lambda$s, the integrals of eq S7 are expressed by eq S9.
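The averaging rule described above can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats a generic product of cosines whose arguments are integer combinations of the $\lambda$s plus a phase, not the specific $f$- and $g$-terms of eqs S2–S4, and the function names `cosine_product_average` and `grid_average` are hypothetical. Each cosine is written as a sum of two exponentials, all sign combinations are enumerated, and only those whose net coefficient at every $\lambda$ vanishes are kept.

```python
import itertools
import numpy as np


def cosine_product_average(terms, n_angles):
    """Average of prod_j cos(n_j . lam + phase_j) over all lam in [0, 2*pi)^n_angles.

    `terms` is a list of (coeffs, phase) pairs, where `coeffs` is a tuple of
    integer coefficients, one per angle. (Illustrative input format only.)
    """
    total = 0.0
    # Each cosine is (e^{ix} + e^{-ix}) / 2: enumerate every choice of signs.
    for signs in itertools.product((1, -1), repeat=len(terms)):
        net = np.zeros(n_angles, dtype=int)
        phase = 0.0
        for s, (coeffs, ph) in zip(signs, terms):
            net += s * np.asarray(coeffs)
            phase += s * ph
        # Only exponentials with zero coefficient at every lambda survive.
        if not net.any():
            total += np.cos(phase)
    return total / 2 ** len(terms)


def grid_average(terms, n_angles, n_grid=64):
    """Brute-force check: mean of the same product over a uniform angle grid."""
    grid = 2.0 * np.pi * np.arange(n_grid) / n_grid
    mesh = np.meshgrid(*([grid] * n_angles), indexing="ij")
    prod = np.ones_like(mesh[0])
    for coeffs, ph in terms:
        arg = ph + sum(c * m for c, m in zip(coeffs, mesh))
        prod = prod * np.cos(arg)
    return prod.mean()


# Example: <cos(l1 + a) cos(l1 - l2 + b) cos(l2 + c)> = cos(a - b - c) / 4
a, b, c = 0.3, 1.1, -0.7
terms = [((1, 0), a), ((1, -1), b), ((0, 1), c)]
print(cosine_product_average(terms, 2), grid_average(terms, 2))
```

In this example only two of the eight sign combinations have zero net coefficients at both angles, which is why the average collapses to a single cosine of a combination of the phase angles.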
Plugging eq S9 into eq S6 and then, after reintroducing the atom indices, eq S6 into eq S7 and, finally, recursively applying the reduction formulas for cosines of different angles to get the same phase angles in all terms, canceling the $(-1)^{m-3}$ factors, and defining … we obtain eq 3 of the main text.

References
Figure S2: Heat maps of the 2D distributions (expressed as probability per degree$^2$) (A, C, and E) and correlations (B, D, and F) of $\gamma_N$ and $\gamma_C$ for 5-residue ETB segments of protein chains not containing glycine or proline residues, derived assuming the cut-off for the central $\theta$ angle, $\theta_{\mathrm{cut}} = 120^\circ$ (A), $\theta_{\mathrm{cut}} = 135^\circ$ (B) and for the NETB segments of protein chains not containing proline or glycine residues (C). The unit of the color scale is $10^{-5}$ on panels A, C, and E and 1 on panels B, D, and F, respectively. The plots were made with GRI.

Figure S3: Potentials of mean force in the sum of virtual-bond-dihedral angles $\gamma$ ($\Gamma$) along chain segments with $n$ consecutive backbone-virtual-bond dihedrals for the ETB segments of protein chains (filled red circles and solid red lines) and NETB segments of protein chains (filled green triangles and green dashed lines) not containing glycine or proline residues. The lines are the C-splines linking the points. The plot was made with gnuplot.

Figure S4: Heat maps of the 2D distributions (expressed as probability per degree$^2$) of $\gamma_N$ and $\gamma_C$ for (A) folded (FD), (B) folded helical (FH), and (C) non-structured (NS) segments of protein chains not containing glycine or proline residues for three ranges of the number of dihedrals ($n$) contained in a segment. The unit of the color scale is $10^{-5}$. The plots were made with GRI.

Figure S6: Heat maps of the 2D distributions (expressed as probability per degree$^2$) of $\gamma_N$ and $\gamma_C$ corresponding to the CASP14 UNRES group models (A and B) and CASP14 UNRES-template group models (C and D) for the folded (FD) (A and C) and folded helical (FH) (B and D) chain segments for three ranges of the number of dihedrals ($n$) contained in a segment. The unit of the color scale is $10^{-5}$. The plots were made with GRI.

Figure S7: Heat maps of the 2D distributions (expressed as probability per degree$^2$) (A, C, E, G) and correlations (B, D, F, H) of $\gamma_N$ and $\gamma_C$ for 5-residue ETB segments from 1,101 allosteric proteins (A and B) and three sets of 1,101 proteins each, selected at random from the PDB. The distributions were derived assuming the cut-off for the central $\theta$ angle, $\theta_{\mathrm{cut}} = 135^\circ$. The unit of the color scale is $10^{-5}$. The plots were made with GRI.
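
As a rough illustration of the quantities plotted in these heat maps, the sketch below computes a virtual-bond dihedral angle from four consecutive Cα positions and bins pairs of dihedrals into a 2D density expressed as probability per degree$^2$. It is a minimal sketch under stated assumptions, not the procedure used to produce the figures: the function names, the 10° bin width, the IUPAC sign convention, and the random input data are all hypothetical choices, and the segment selection (ETB, NETB, FD, FH, NS) is not reproduced.

```python
import numpy as np


def virtual_bond_dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive points.

    Assumes the IUPAC sign convention (cis = 0, trans = +/-180).
    """
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1 = np.cross(b0, b1)  # normal to the plane of the first three points
    n2 = np.cross(b1, b2)  # normal to the plane of the last three points
    x = np.dot(n1, n2)
    y = np.dot(b1 / np.linalg.norm(b1), np.cross(n1, n2))
    return np.degrees(np.arctan2(y, x))


def density_per_square_degree(gamma_n, gamma_c, bin_width=10.0):
    """2D histogram of (gamma_N, gamma_C) normalized to probability per degree^2."""
    n_bins = int(round(360.0 / bin_width))
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    hist, _, _ = np.histogram2d(gamma_n, gamma_c, bins=[edges, edges], density=True)
    return hist, edges


# Usage with made-up data (uniform random angles, for illustration only).
rng = np.random.default_rng(0)
gamma_n = rng.uniform(-180.0, 180.0, size=5000)
gamma_c = rng.uniform(-180.0, 180.0, size=5000)
hist, edges = density_per_square_degree(gamma_n, gamma_c)
print(hist.sum() * 10.0 ** 2)  # integrates to 1 over the whole map
```

With `density=True`, each bin value is a count divided by the total number of samples and by the bin area, so a uniform distribution over the full 360° × 360° map would give about $0.77 \times 10^{-5}$ per degree$^2$, the same order of magnitude as the color-scale unit quoted in the captions.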